Variable selection for binary classification using error rate p-values applied to metabolomics data
Similar sources
Improving the stability of wrapper variable selection applied to binary classification
Wrapper variable selection methods are widely adopted in many applications, including the design of classifiers. The main problem with these approaches is the stability of the selection: using different training data sets can lead to the selection of different variable subsets. This problem is particularly critical in applications where variable selection is use...
Bayesian variable selection for disease classification using gene expression data
MOTIVATION An important application of gene expression microarray data is the classification of samples into categories. Accurate classification depends upon the method used to identify the most relevant genes. Owing to the large number of genes and relatively small sample size, the selection process can be unstable. Modification of existing methods for achieving better analysis of microarray d...
Feature selection using genetic algorithm for classification of schizophrenia using fMRI data
In this paper we propose a new method for classification of subjects into schizophrenia and control groups using functional magnetic resonance imaging (fMRI) data. In the preprocessing step, the number of fMRI time points is reduced using principal component analysis (PCA). Then, independent component analysis (ICA) is used for further data analysis. It estimates independent components (ICs) of...
P-values for classification
Let (X, Y) be a random variable consisting of an observed feature vector X ∈ 𝒳 and an unobserved class label Y ∈ {1, 2, ..., L} with unknown joint distribution. In addition, let D be a training data set consisting of n completely observed independent copies of (X, Y). Usual classification procedures provide point predictors (classifiers) Ŷ(X, D) of Y or estimate the conditional distribution ...
Classification and Biomarker Genes Selection for Cancer Gene Expression Data Using Random Forest
Background & objective: Microarray and next-generation sequencing (NGS) data are important sources for finding useful molecular patterns. The large volume of gene expression data also makes it more challenging to identify biomarkers associated with cancer. The random forest (RF) is used to effectively analyze the problems of large-p and smal...
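The stability problem raised in the wrapper variable selection abstract listed above (different training sets leading to different selected subsets) can be quantified directly. The following is a minimal Python sketch that assumes stability is scored as the mean pairwise Jaccard similarity of the subsets selected on different training sets. This is one common choice and not necessarily the measure used in that paper. The function names `jaccard` and `selection_stability` and the variable subsets are hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Jaccard similarity |a & b| / |a | b|; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def selection_stability(subsets: list) -> float:
    """Mean pairwise Jaccard similarity over variable subsets selected on
    different training sets (1.0 means the same subset was chosen every time).
    Assumption: this stability measure is illustrative, not taken from the source."""
    if len(subsets) < 2:
        raise ValueError("need at least two selected subsets to compare")
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)


# Hypothetical subsets chosen by a wrapper method on three resampled training sets.
runs = [{"v1", "v2", "v5"}, {"v1", "v2", "v7"}, {"v1", "v3", "v5"}]
print(round(selection_stability(runs), 3))  # prints 0.4
```

With these three runs, only `v1` is selected every time. The pairwise similarities are 0.5, 0.5 and 0.2, so the score is 0.4, which is well below the value of 1.0 that a perfectly stable selection would give.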
Journal
Journal title: BMC Bioinformatics
Year: 2016
ISSN: 1471-2105
DOI: 10.1186/s12859-015-0867-7